Tabular Q solution example #125

JD-ETH · 2021-03-10T22:45:35Z

@ChrisCummins
A working example with tabular Q-learning.

As this is my first real PR, please be harsh about formatting, logging, commenting, and styling, and point me to any guidelines that I have missed. Thanks!
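
For readers new to the technique, here is a minimal sketch of tabular Q-learning. It is not the code in examples/tabular_q.py: the five-cell corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are assumptions made up for illustration, and the block uses only the standard library rather than CompilerGym.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: walk along a 5-cell corridor to the rightmost cell.
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
GOAL = 4
MAX_STEPS = 20


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = min(GOAL, state + 1) if action == 1 else max(0, state - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # Rows are created on first access, so only visited states are stored.
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Map each visited state to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in sorted(q)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Keying the table by state and creating rows on first access means memory grows only with the states actually visited, which helps when the state space is too large to enumerate up front.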

ChrisCummins

This is really nice @JD-ETH, thanks a lot! Having a value-iteration baseline is a very welcome addition.

In general the implementation is solid, but since you asked, I have gone through and left a very pedantic review 😃 Don't worry, this level of scrutiny is not normal! Most things are small nitpicks or bikeshedding. The most important change I'd love to see is to bring the level of commenting up to the point of this being a reasonably-standalone explanation of the technique for a general non-expert audience.

Leave a comment here once you want me to take another look.

Cheers,
Chris

examples/brute_force.py

examples/tabular_q.py

examples/brute_force.py

ChrisCummins · 2021-03-11T19:00:51Z

Oo one thing I forgot - please add a test 😊

Given that this isn't core API, a simple smoke test that uses flags for a quick run, such as --episodes=5, would be sufficient. Take a look at the actor critic smoke test as an example: https://github.com/facebookresearch/CompilerGym/blob/development/examples/actor_critic_test.py

You will then need to add Bazel definitions so that the test runs as part of make test: https://github.com/facebookresearch/CompilerGym/blob/development/examples/BUILD#L8-L25

EDIT: Also worth noting that this test will fail until this PR is rebased on #126.
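
A smoke test along these lines might look like the sketch below. Everything in it is a stand-in: `main`, its `--episode_length` and `--seed` flags, and the placeholder episode loop are hypothetical, and it uses argparse so that the block runs on its own. Only `--episodes=5` comes from the comment above. The linked actor_critic_test.py remains the pattern to follow for the real test.

```python
import argparse
import random


def main(argv):
    """Hypothetical stand-in for the example script's entry point."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--episodes", type=int, default=5000)
    parser.add_argument("--episode_length", type=int, default=5)
    parser.add_argument("--seed", type=int, default=0)
    args = parser.parse_args(argv)

    rng = random.Random(args.seed)
    episode_rewards = []
    for _ in range(args.episodes):
        # Placeholder episode: the real script would step an environment here.
        episode_rewards.append(
            sum(rng.random() for _ in range(args.episode_length))
        )
    return episode_rewards


def test_run_smoke_test():
    """Run a handful of very short episodes and check that nothing crashes."""
    episode_rewards = main(["--episodes=5", "--episode_length=2", "--seed=0"])
    assert len(episode_rewards) == 5


if __name__ == "__main__":
    test_run_smoke_test()
```

The point of shrinking the flags is speed: the test only checks that the script runs end to end, not that learning converges.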

JD-ETH · 2021-03-16T19:15:16Z

@ChrisCummins I'm ready for the next round.

ChrisCummins

Hi @JD-ETH, thanks a lot for making those changes, I really appreciate the extra docs and the test!

This LGTM, so feel free to merge it when you're ready. Before merging, you may want to squash the history into smaller atomic commits, perhaps one for the brute_force fix and another for tabular_q, but that's not essential so I'll leave it up to your discretion :) Thanks for adding this new baseline!

Cheers,
Chris

examples/tabular_q.py


JD-ETH · 2021-03-20T17:33:55Z

@ChrisCummins Now I see that CI is failing on this PR, sorry for merging it before the CI tests finished. I am not sure I understand what failed in the test, though. Could you give me some pointers?

ChrisCummins · 2021-03-22T10:53:34Z

> @ChrisCummins Now I see that CI is failing on this PR, sorry for merging it before the CI tests finished. I am not sure I understand what failed in the test, though. Could you give me some pointers?

Thanks for checking. It's a false positive, nothing to worry about :). See #144.

JD-at-work added 9 commits March 4, 2021 13:26

9a6ce8f format
44ea8c0 format
0a24c0a before pull
43d744d Merge branch 'development' into feature/tabular-q
ceb2ab4 running but fails to converge due to connection issues
af2c907 compare with brute force
ed1944d Merge branch 'development' into feature/tabular-q
8df14ba comparable
700874d cleanup

facebook-github-bot added the CLA Signed label Mar 10, 2021

ChrisCummins requested changes Mar 11, 2021

JD-at-work added 5 commits March 16, 2021 16:57

da07181 formatted
b5bdb31 Merge branch 'development' into feature/tabular-q
2ba7147 bugfix
76df0ac CI fails to parse tuple
7e06df7 test passed

ChrisCummins approved these changes Mar 16, 2021

examples/tabular_q.py (outdated)

2eefc3d Fixed usage string

Co-authored-by: Chris Cummins <chrisc.101@gmail.com>

benoitsteiner reviewed Mar 17, 2021

examples/tabular_q.py (outdated)

JD-at-work added 2 commits March 20, 2021 17:11

9a22489 init table only for visited states
d73384c Merge branch 'feature/tabular-q' of https://github.com/JD-ETH/CompilerGym into feature/tabular-q

JD-ETH merged commit 6b9dbd1 into facebookresearch:development Mar 20, 2021

JD-ETH added a commit that referenced this pull request Mar 20, 2021

c182257 Revert "Tabular Q solution example (#125)"

This reverts commit 6b9dbd1.
